Improving Software Pipelining by Hiding Memory Latency with Combined Loads and Prefetches
Authors
Abstract
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit software prefetching instructions. Unfortunately, each mechanism has potential drawbacks. Non-blocking loads can significantly increase register pressure by extending the lifetimes of loads. Software prefetching increases the number of memory instructions in the loop body. For a loop whose execution time is bound by the number of loads/stores that can be issued per cycle, software prefetching exacerbates this problem and increases the number of idle computational cycles in loops. In this paper, we show how compiler and architecture support for combining a load and a prefetch into one instruction, called a prefetching load, can give lower register pressure like software prefetching and lower load/store-unit requirements like non-blocking loads. On a set of 106 Fortran loops we show that prefetching loads obtain a speedup of 1.07–1.53 over using just non-blocking loads and a speedup of 1.04–1.08 over using software prefetching. In addition, prefetching loads reduced floating-point register pressure by as much as a factor of 0.4 and integer register pressure by as much as a factor of 0.8 over non-blocking loads. Integer register pressure was also reduced by a factor of 0.97 over software prefetching, while floating-point register pressure was increased by a factor of 1.02 versus software prefetching in the worst case.
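To make the trade-off concrete, the sketch below contrasts the three latency-hiding schemes on a simple reduction loop. This is an illustrative C sketch and not the paper's experimental code: the function names, the prefetch distance DIST, and the ldpf mnemonic in the final comment are hypothetical names chosen here for exposition, and __builtin_prefetch is the GCC/Clang intrinsic standing in for an explicit software prefetch instruction.

    #include <stddef.h>

    /* Illustrative only: real compilers perform these transformations while
     * software pipelining; the functions below just make the costs visible. */

    /* (1) Non-blocking loads: the scheduler issues the load of a[i] many
     *     cycles before the add that consumes it, so the loaded value stays
     *     live in a register across that span -- the register-pressure cost. */
    double sum_nonblocking(const double *a, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];        /* load hoisted early, value held in a register */
        return sum;
    }

    /* (2) Software prefetching: an explicit prefetch per iteration pulls a
     *     future cache line toward the processor.  No long-lived register is
     *     needed, but each iteration now issues two memory instructions, which
     *     hurts loops already bound by load/store issue bandwidth. */
    double sum_prefetch(const double *a, size_t n)
    {
        enum { DIST = 16 };     /* assumed prefetch distance, in elements */
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* prefetching a little past the end is tolerated by hardware;
             * a production version would clamp the prefetch address */
            __builtin_prefetch(&a[i + DIST], 0 /* read */, 3 /* keep cached */);
            sum += a[i];
        }
        return sum;
    }

    /* (3) Prefetching load (the paper's proposal): one instruction both loads
     *     a[i] for immediate use and prefetches the line holding a[i + DIST].
     *     It keeps the short register lifetime of (2) while issuing only one
     *     memory instruction per element, like (1).  Sketched as a comment,
     *     since no standard ISA exposes such an instruction to C:
     *
     *         ldpf  f0, [r1], DIST*8   ; hypothetical: load *r1 into f0 and
     *                                  ; prefetch the line at r1 + DIST*8
     */

Read this way, the reported speedup over software prefetching comes from removing the extra prefetch instruction from load/store-bound loops, while the register-pressure reduction relative to non-blocking loads comes from no longer holding loaded values in registers across long latency spans.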
Similar articles
Improving Software Pipelining by Hiding Memory Latency
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit software prefetching instructions. Unfortunately, each mechanism has potential drawbacks. Non-blocking loads can significantly increase register pressure by extending the lifetimes of loads. Software prefetching increases the number of memory instructions in the loop body. For a loop whose executio...
A Software Pipelining Method Based on a Hierarchical Social Algorithm
Software pipelining is a compile-time scheduling technique that overlaps successive loop iterations to achieve instruction-level parallelism. It allows us to hide memory latency by overlapping the prefetches for a future iteration with the computation of the current iteration. This paper presents an efficient algorithm for determining the iteration bound of cyclic data flow graphs and the optim...
Effects of Main Memory Latencies on the Performance of Nonblocking Caches
Lockup-free caches in conjunction with non-blocking processor loads have been proposed to hide miss latencies in high performance processors. One problem with current approaches is the increased complexity of the processor and of the cache controller due to non-blocking. In this paper, we introduce a simple mechanism to support non-blocking loads and a lockup-free cache. A modified SPARC architec...
A comparative evaluation of software techniques to hide memory latency
Software-oriented techniques to hide memory latency in superscalar and superpipelined machines include loop unrolling, software pipelining, and software cache prefetching. Issuing the data fetch request prior to the actual need for the data allows overlap of accessing with useful computations. Loop unrolling and software pipelining do not necessitate microarchitecture or instruction set architecture ch...
A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches
Aggressive hardware-based and software-based prefetch algorithms for hiding memory access latencies have been proposed to bridge the expanding speed gap between processors and memory subsystems. As smaller L1 caches prevail in deep submicron processor designs in order to maintain short cache access cycles, cache pollution caused by ineffective prefetches is becoming a major challeng...
Publication date: 2011